Overview

Dataset statistics

Number of variables: 14
Number of observations: 3314
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 294.6 KiB
Average record size in memory: 91.0 B

Variable types

Numeric: 8
Categorical: 3
Boolean: 3

Alerts

LargestPropertyUseTypeGFA is highly overall correlated with TotalGHGEmissions and 4 other fields | High correlation
TotalGHGEmissions is highly overall correlated with LargestPropertyUseTypeGFA and 4 other fields | High correlation
SiteEnergyUse_kBtu_ is highly overall correlated with LargestPropertyUseTypeGFA and 4 other fields | High correlation
LargestPropertyUseTypeGFA_log is highly overall correlated with LargestPropertyUseTypeGFA and 4 other fields | High correlation
TotalGHGEmissions_log is highly overall correlated with LargestPropertyUseTypeGFA and 5 other fields | High correlation
SiteEnergyUse_kBtu_log is highly overall correlated with LargestPropertyUseTypeGFA and 4 other fields | High correlation
BuildingType is highly overall correlated with PrimaryPropertyType | High correlation
Have_NaturalGas_Energy is highly overall correlated with TotalGHGEmissions_log | High correlation
PrimaryPropertyType is highly overall correlated with BuildingType | High correlation
Have_Stream_Energy is highly imbalanced (76.3%) | Imbalance
Have_Electricity_Energy is highly imbalanced (99.6%) | Imbalance
NumberofBuildings is highly skewed (γ1 = 43.11529862) | Skewed
LargestPropertyUseTypeGFA is highly skewed (γ1 = 30.22421764) | Skewed
SiteEnergyUse_kBtu_ is highly skewed (γ1 = 24.69850505) | Skewed
SiteEnergyUse_kBtu_ has unique values | Unique
SiteEnergyUse_kBtu_log has unique values | Unique
NumberofBuildings has 92 (2.8%) zeros | Zeros
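
The two imbalance percentages are consistent with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy of the class shares and k is the number of classes. The report does not state this formula, so the sketch below is an assumption; it only checks that the formula reproduces the reported figures from the Boolean value counts listed in this report (3185/129 and 3313/1). The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Value counts copied from the two Boolean variables flagged as imbalanced.
stream = imbalance_score([3185, 129])      # False / True counts
electricity = imbalance_score([3313, 1])   # True / False counts

print(f"{stream:.1%}  {electricity:.1%}")
```

A score of 0 means perfectly balanced classes and a score near 1 means almost every row falls in a single class, which is why a column with one `False` in 3314 rows scores so high.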

Reproduction

Analysis started: 2023-06-20 13:11:24.563326
Analysis finished: 2023-06-20 13:11:58.211299
Duration: 33.65 seconds
Software version: ydata-profiling v4.2.0
Download configuration: config.json
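
As a quick arithmetic check, the reported duration can be recomputed from the two timestamps. This is only a sketch using the standard library; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the timestamps above
started = datetime.strptime("2023-06-20 13:11:24.563326", fmt)
finished = datetime.strptime("2023-06-20 13:11:58.211299", fmt)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```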

Variables

YearBuilt
Real number (ℝ)

Distinct: 113
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1968.6976
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 1900
5-th percentile: 1908
Q1: 1948
median: 1975
Q3: 1997
95-th percentile: 2012
Maximum: 2015
Range: 115
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 33.059519
Coefficient of variation (CV): 0.016792583
Kurtosis: -0.86857772
Mean: 1968.6976
Median Absolute Deviation (MAD): 24
Skewness: -0.54205395
Sum: 6524264
Variance: 1092.9318
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
2000 | 71 | 2.1%
2014 | 67 | 2.0%
2008 | 64 | 1.9%
1968 | 63 | 1.9%
1989 | 63 | 1.9%
1999 | 63 | 1.9%
1988 | 61 | 1.8%
2001 | 59 | 1.8%
2002 | 58 | 1.8%
1990 | 58 | 1.8%
Other values (103) | 2687 | 81.1%

Minimum 10 values
Value | Count | Frequency (%)
1900 | 53 | 1.6%
1901 | 8 | 0.2%
1902 | 11 | 0.3%
1903 | 3 | 0.1%
1904 | 14 | 0.4%
1905 | 9 | 0.3%
1906 | 18 | 0.5%
1907 | 31 | 0.9%
1908 | 27 | 0.8%
1909 | 31 | 0.9%

Maximum 10 values
Value | Count | Frequency (%)
2015 | 35 | 1.1%
2014 | 67 | 2.0%
2013 | 50 | 1.5%
2012 | 35 | 1.1%
2011 | 15 | 0.5%
2010 | 24 | 0.7%
2009 | 41 | 1.2%
2008 | 64 | 1.9%
2007 | 42 | 1.3%
2006 | 45 | 1.4%
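
Several of the YearBuilt figures are derived from one another. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum from the reported summary values; it assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared), which the report does not spell out.

```python
# Summary values copied from the YearBuilt statistics in this report.
n = 3314
mean, std = 1968.6976, 33.059519
minimum, maximum = 1900, 2015
q1, q3 = 1948, 1997

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n                  # Sum, up to rounding of the reported mean

print(value_range, iqr, round(cv, 6), round(variance, 2), round(total))
```

The recomputed values agree with the reported range (115), IQR (49), CV (0.016792583), variance (1092.9318) and sum (6524264) to within rounding.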

BuildingType
Categorical

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

NonResidential: 1439
Multifamily LR (1-4): 996
Multifamily MR (5-9): 578
Multifamily HR (10+): 109
Nonresidential COS: 84
Other values (3): 108

Length

Max length: 20
Median length: 20
Mean length: 17.166566
Min length: 6

Characters and Unicode

Total characters: 56890
Distinct characters: 40
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: NonResidential
2nd row: NonResidential
3rd row: NonResidential
4th row: NonResidential
5th row: NonResidential

Common Values

Value | Count | Frequency (%)
NonResidential | 1439 | 43.4%
Multifamily LR (1-4) | 996 | 30.1%
Multifamily MR (5-9) | 578 | 17.4%
Multifamily HR (10+) | 109 | 3.3%
Nonresidential COS | 84 | 2.5%
SPS-District K-12 | 83 | 2.5%
Campus | 24 | 0.7%
Nonresidential WA | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
multifamily | 1683 | 24.6%
nonresidential | 1524 | 22.3%
lr | 996 | 14.5%
1-4 | 996 | 14.5%
mr | 578 | 8.4%
5-9 | 578 | 8.4%
hr | 109 | 1.6%
10 | 109 | 1.6%
cos | 84 | 1.2%
sps-district | 83 | 1.2%
Other values (3) | 108 | 1.6%
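
The word frequencies above can be reproduced from the BuildingType category counts. The sketch below assumes a simple tokenisation rule (lowercase, split on whitespace, trim non-alphanumeric characters from the edges of each word); the report does not document the rule, and the helper name `tokens` is hypothetical.

```python
import re
from collections import Counter

# Category counts copied from the BuildingType "Common Values" table.
value_counts = {
    "NonResidential": 1439,
    "Multifamily LR (1-4)": 996,
    "Multifamily MR (5-9)": 578,
    "Multifamily HR (10+)": 109,
    "Nonresidential COS": 84,
    "SPS-District K-12": 83,
    "Campus": 24,
    "Nonresidential WA": 1,
}


def tokens(text):
    """Lowercase, split on whitespace, trim non-word edges (assumed rule)."""
    for word in text.lower().split():
        word = re.sub(r"^\W+|\W+$", "", word)
        if word:
            yield word


word_counts = Counter()
for value, count in value_counts.items():
    for token in tokens(value):
        word_counts[token] += count  # weight each token by its category count

total = sum(word_counts.values())
for word, count in word_counts.most_common(4):
    print(f"{word}: {count} ({count / total:.1%})")
```

Under this rule "(1-4)" becomes "1-4" and "(10+)" becomes "10", which matches the tokens listed in the table.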

Most occurring characters

Value | Count | Frequency (%)
i | 6580 | 11.6%
l | 4890 | 8.6%
(space) | 3534 | 6.2%
t | 3373 | 5.9%
a | 3231 | 5.7%
R | 3122 | 5.5%
n | 3048 | 5.4%
e | 3048 | 5.4%
M | 2261 | 4.0%
- | 1740 | 3.1%
Other values (30) | 22063 | 38.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 35904 | 63.1%
Uppercase Letter | 8705 | 15.3%
Space Separator | 3534 | 6.2%
Decimal Number | 3532 | 6.2%
Dash Punctuation | 1740 | 3.1%
Open Punctuation | 1683 | 3.0%
Close Punctuation | 1683 | 3.0%
Math Symbol | 109 | 0.2%
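
The category totals above can be recomputed with Python's standard `unicodedata` module, which returns the two-letter Unicode general category of a character (`Ll` = Lowercase Letter, `Lu` = Uppercase Letter, `Zs` = Space Separator, `Nd` = Decimal Number, `Pd` = Dash Punctuation, `Ps`/`Pe` = Open/Close Punctuation, `Sm` = Math Symbol). This is a sketch of the idea rather than the profiler's own code; each character is weighted by the count of the category value it appears in.

```python
import unicodedata
from collections import Counter

# Category counts copied from the BuildingType "Common Values" table.
value_counts = {
    "NonResidential": 1439,
    "Multifamily LR (1-4)": 996,
    "Multifamily MR (5-9)": 578,
    "Multifamily HR (10+)": 109,
    "Nonresidential COS": 84,
    "SPS-District K-12": 83,
    "Campus": 24,
    "Nonresidential WA": 1,
}

category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        # unicodedata.category returns codes such as "Ll", "Lu", "Zs".
        category_counts[unicodedata.category(ch)] += count

total_chars = sum(category_counts.values())
for code, count in category_counts.most_common():
    print(code, count, f"{count / total_chars:.1%}")
```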

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 6580 | 18.3%
l | 4890 | 13.6%
t | 3373 | 9.4%
a | 3231 | 9.0%
n | 3048 | 8.5%
e | 3048 | 8.5%
u | 1707 | 4.8%
m | 1707 | 4.8%
f | 1683 | 4.7%
y | 1683 | 4.7%
Other values (6) | 4954 | 13.8%

Uppercase Letter
Value | Count | Frequency (%)
R | 3122 | 35.9%
M | 2261 | 26.0%
N | 1524 | 17.5%
L | 996 | 11.4%
S | 250 | 2.9%
H | 109 | 1.3%
C | 108 | 1.2%
O | 84 | 1.0%
P | 83 | 1.0%
D | 83 | 1.0%
Other values (3) | 85 | 1.0%

Decimal Number
Value | Count | Frequency (%)
1 | 1188 | 33.6%
4 | 996 | 28.2%
5 | 578 | 16.4%
9 | 578 | 16.4%
0 | 109 | 3.1%
2 | 83 | 2.3%

Space Separator
Value | Count | Frequency (%)
(space) | 3534 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1740 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1683 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1683 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 109 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 44609 | 78.4%
Common | 12281 | 21.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 6580 | 14.8%
l | 4890 | 11.0%
t | 3373 | 7.6%
a | 3231 | 7.2%
R | 3122 | 7.0%
n | 3048 | 6.8%
e | 3048 | 6.8%
M | 2261 | 5.1%
u | 1707 | 3.8%
m | 1707 | 3.8%
Other values (19) | 11642 | 26.1%

Common
Value | Count | Frequency (%)
(space) | 3534 | 28.8%
- | 1740 | 14.2%
( | 1683 | 13.7%
) | 1683 | 13.7%
1 | 1188 | 9.7%
4 | 996 | 8.1%
5 | 578 | 4.7%
9 | 578 | 4.7%
0 | 109 | 0.9%
+ | 109 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 56890 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 6580 | 11.6%
l | 4890 | 8.6%
(space) | 3534 | 6.2%
t | 3373 | 5.9%
a | 3231 | 5.7%
R | 3122 | 5.5%
n | 3048 | 5.4%
e | 3048 | 5.4%
M | 2261 | 4.0%
- | 1740 | 3.1%
Other values (30) | 22063 | 38.8%

Neighborhood
Categorical

Distinct: 19
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

DOWNTOWN: 562
EAST: 448
MAGNOLIA / QUEEN ANNE: 415
GREATER DUWAMISH: 371
NORTHEAST: 274
Other values (14): 1244

Length

Max length: 22
Median length: 16
Mean length: 10.118286
Min length: 4

Characters and Unicode

Total characters: 33532
Distinct characters: 34
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: DOWNTOWN
2nd row: DOWNTOWN
3rd row: DOWNTOWN
4th row: DOWNTOWN
5th row: DOWNTOWN

Common Values

Value | Count | Frequency (%)
DOWNTOWN | 562 | 17.0%
EAST | 448 | 13.5%
MAGNOLIA / QUEEN ANNE | 415 | 12.5%
GREATER DUWAMISH | 371 | 11.2%
NORTHEAST | 274 | 8.3%
LAKE UNION | 249 | 7.5%
NORTHWEST | 208 | 6.3%
SOUTHWEST | 157 | 4.7%
NORTH | 142 | 4.3%
BALLARD | 124 | 3.7%
Other values (9) | 364 | 11.0%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
downtown | 562 | 10.8%
east | 448 | 8.6%
magnolia | 415 | 8.0%
(blank) | 415 | 8.0%
queen | 415 | 8.0%
anne | 415 | 8.0%
greater | 371 | 7.2%
duwamish | 371 | 7.2%
northeast | 274 | 5.3%
union | 249 | 4.8%
Other values (9) | 1245 | 24.0%

Most occurring characters

Value | Count | Frequency (%)
N | 4060 | 12.1%
E | 3679 | 11.0%
A | 3401 | 10.1%
T | 3090 | 9.2%
O | 2666 | 8.0%
(space) | 1866 | 5.6%
W | 1860 | 5.5%
S | 1804 | 5.4%
R | 1672 | 5.0%
U | 1286 | 3.8%
Other values (24) | 8148 | 24.3%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 30773 | 91.8%
Space Separator | 1866 | 5.6%
Lowercase Letter | 478 | 1.4%
Other Punctuation | 415 | 1.2%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
N | 4060 | 13.2%
E | 3679 | 12.0%
A | 3401 | 11.1%
T | 3090 | 10.0%
O | 2666 | 8.7%
W | 1860 | 6.0%
S | 1804 | 5.9%
R | 1672 | 5.4%
U | 1286 | 4.2%
H | 1248 | 4.1%
Other values (9) | 6007 | 19.5%

Lowercase Letter
Value | Count | Frequency (%)
r | 89 | 18.6%
t | 89 | 18.6%
o | 52 | 10.9%
h | 52 | 10.9%
e | 45 | 9.4%
l | 44 | 9.2%
a | 40 | 8.4%
n | 26 | 5.4%
w | 11 | 2.3%
s | 11 | 2.3%
Other values (3) | 19 | 4.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1866 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 415 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 31251 | 93.2%
Common | 2281 | 6.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 4060 | 13.0%
E | 3679 | 11.8%
A | 3401 | 10.9%
T | 3090 | 9.9%
O | 2666 | 8.5%
W | 1860 | 6.0%
S | 1804 | 5.8%
R | 1672 | 5.4%
U | 1286 | 4.1%
H | 1248 | 4.0%
Other values (22) | 6485 | 20.8%

Common
Value | Count | Frequency (%)
(space) | 1866 | 81.8%
/ | 415 | 18.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 33532 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 4060 | 12.1%
E | 3679 | 11.0%
A | 3401 | 10.1%
T | 3090 | 9.2%
O | 2666 | 8.0%
(space) | 1866 | 5.6%
W | 1860 | 5.5%
S | 1804 | 5.4%
R | 1672 | 5.0%
U | 1286 | 3.8%
Other values (24) | 8148 | 24.3%

Have_Stream_Energy
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

False: 3185
True: 129

Value | Count | Frequency (%)
False | 3185 | 96.1%
True | 129 | 3.9%

Have_Electricity_Energy
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

True: 3313
False: 1

Value | Count | Frequency (%)
True | 3313 | > 99.9%
False | 1 | < 0.1%

Have_NaturalGas_Energy
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

True: 2090
False: 1224

Value | Count | Frequency (%)
True | 2090 | 63.1%
False | 1224 | 36.9%

PrimaryPropertyType
Categorical

Distinct: 24
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Low-Rise Multifamily: 966
Mid-Rise Multifamily: 561
Small- and Mid-Sized Office: 288
Other: 250
Warehouse: 187
Other values (19): 1062

Length

Max length: 27
Median length: 22
Mean length: 17.213941
Min length: 5

Characters and Unicode

Total characters: 57047
Distinct characters: 43
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Hotel
2nd row: Hotel
3rd row: Hotel
4th row: Hotel
5th row: Hotel

Common Values

Value | Count | Frequency (%)
Low-Rise Multifamily | 966 | 29.1%
Mid-Rise Multifamily | 561 | 16.9%
Small- and Mid-Sized Office | 288 | 8.7%
Other | 250 | 7.5%
Warehouse | 187 | 5.6%
Large Office | 170 | 5.1%
Mixed Use Property | 132 | 4.0%
K-12 School | 123 | 3.7%
High-Rise Multifamily | 104 | 3.1%
Retail Store | 89 | 2.7%
Other values (14) | 444 | 13.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
multifamily | 1631 | 23.7%
low-rise | 966 | 14.0%
mid-rise | 561 | 8.1%
office | 500 | 7.3%
small | 288 | 4.2%
and | 288 | 4.2%
mid-sized | 288 | 4.2%
other | 250 | 3.6%
warehouse | 199 | 2.9%
large | 170 | 2.5%
Other values (28) | 1745 | 25.3%

Most occurring characters

Value | Count | Frequency (%)
i | 7501 | 13.1%
e | 4476 | 7.8%
l | 4346 | 7.6%
(space) | 3572 | 6.3%
a | 3002 | 5.3%
t | 2755 | 4.8%
f | 2671 | 4.7%
M | 2651 | 4.6%
- | 2358 | 4.1%
s | 2154 | 3.8%
Other values (33) | 21561 | 37.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 42326 | 74.2%
Uppercase Letter | 8506 | 14.9%
Space Separator | 3572 | 6.3%
Dash Punctuation | 2358 | 4.1%
Decimal Number | 246 | 0.4%
Other Punctuation | 39 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 7501 | 17.7%
e | 4476 | 10.6%
l | 4346 | 10.3%
a | 3002 | 7.1%
t | 2755 | 6.5%
f | 2671 | 6.3%
s | 2154 | 5.1%
o | 2056 | 4.9%
m | 2048 | 4.8%
u | 1979 | 4.7%
Other values (14) | 9338 | 22.1%

Uppercase Letter
Value | Count | Frequency (%)
M | 2651 | 31.2%
R | 1767 | 20.8%
L | 1146 | 13.5%
S | 967 | 11.4%
O | 750 | 8.8%
W | 268 | 3.2%
H | 213 | 2.5%
U | 157 | 1.8%
C | 143 | 1.7%
P | 132 | 1.6%
Other values (4) | 312 | 3.7%

Decimal Number
Value | Count | Frequency (%)
1 | 123 | 50.0%
2 | 123 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 3572 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2358 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 39 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 50832 | 89.1%
Common | 6215 | 10.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 7501 | 14.8%
e | 4476 | 8.8%
l | 4346 | 8.5%
a | 3002 | 5.9%
t | 2755 | 5.4%
f | 2671 | 5.3%
M | 2651 | 5.2%
s | 2154 | 4.2%
o | 2056 | 4.0%
m | 2048 | 4.0%
Other values (28) | 17172 | 33.8%

Common
Value | Count | Frequency (%)
(space) | 3572 | 57.5%
- | 2358 | 37.9%
1 | 123 | 2.0%
2 | 123 | 2.0%
/ | 39 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 57047 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 7501 | 13.1%
e | 4476 | 7.8%
l | 4346 | 7.6%
(space) | 3572 | 6.3%
a | 3002 | 5.3%
t | 2755 | 4.8%
f | 2671 | 4.7%
M | 2651 | 4.6%
- | 2358 | 4.1%
s | 2154 | 3.8%
Other values (33) | 21561 | 37.8%

NumberofBuildings
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 17
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1071213
Minimum: 0
Maximum: 111
Zeros: 92
Zeros (%): 2.8%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 1
Maximum: 111
Range: 111
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.1243398
Coefficient of variation (CV): 1.9187959
Kurtosis: 2174.733
Mean: 1.1071213
Median Absolute Deviation (MAD): 0
Skewness: 43.115299
Sum: 3669
Variance: 4.5128198
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Common values
Value | Count | Frequency (%)
1 | 3123 | 94.2%
0 | 92 | 2.8%
2 | 36 | 1.1%
3 | 22 | 0.7%
4 | 12 | 0.4%
5 | 9 | 0.3%
6 | 5 | 0.2%
8 | 3 | 0.1%
10 | 2 | 0.1%
14 | 2 | 0.1%
Other values (7) | 8 | 0.2%

Minimum 10 values
Value | Count | Frequency (%)
0 | 92 | 2.8%
1 | 3123 | 94.2%
2 | 36 | 1.1%
3 | 22 | 0.7%
4 | 12 | 0.4%
5 | 9 | 0.3%
6 | 5 | 0.2%
7 | 1 | < 0.1%
8 | 3 | 0.1%
9 | 2 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
111 | 1 | < 0.1%
27 | 1 | < 0.1%
23 | 1 | < 0.1%
16 | 1 | < 0.1%
14 | 2 | 0.1%
11 | 1 | < 0.1%
10 | 2 | 0.1%
9 | 2 | 0.1%
8 | 3 | 0.1%
7 | 1 | < 0.1%

LargestPropertyUseTypeGFA
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 3085
Distinct (%): 93.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79221.59
Minimum: 5656
Maximum: 9320156
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 5656
5-th percentile: 17521.8
Q1: 25122.25
median: 39894
Q3: 76799
95-th percentile: 244684.8
Maximum: 9320156
Range: 9314500
Interquartile range (IQR): 51676.75

Descriptive statistics

Standard deviation: 202184.9
Coefficient of variation (CV): 2.5521439
Kurtosis: 1324.495
Mean: 79221.59
Median Absolute Deviation (MAD): 17574
Skewness: 30.224218
Sum: 2.6254035 × 10^8
Variance: 4.0878732 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
24000 | 9 | 0.3%
22000 | 8 | 0.2%
30000 | 8 | 0.2%
21600 | 7 | 0.2%
20000 | 7 | 0.2%
28800 | 6 | 0.2%
15000 | 5 | 0.2%
24288 | 5 | 0.2%
36000 | 5 | 0.2%
45000 | 5 | 0.2%
Other values (3075) | 3249 | 98.0%

Minimum 10 values
Value | Count | Frequency (%)
5656 | 1 | < 0.1%
6455 | 1 | < 0.1%
6601 | 1 | < 0.1%
6900 | 1 | < 0.1%
7245 | 1 | < 0.1%
7387 | 1 | < 0.1%
7501 | 1 | < 0.1%
7583 | 1 | < 0.1%
7758 | 1 | < 0.1%
8061 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
9320156 | 1 | < 0.1%
1719643 | 1 | < 0.1%
1680937 | 1 | < 0.1%
1639334 | 1 | < 0.1%
1585960 | 1 | < 0.1%
1350182 | 1 | < 0.1%
1314475 | 1 | < 0.1%
1191115 | 1 | < 0.1%
1172127 | 1 | < 0.1%
1011135 | 1 | < 0.1%

TotalGHGEmissions
Real number (ℝ)

Distinct: 2782
Distinct (%): 83.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.62974
Minimum: 0.4
Maximum: 16870.98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 0.4
5-th percentile: 3.9365
Q1: 9.71
median: 34.28
Q3: 94.2275
95-th percentile: 393.1155
Maximum: 16870.98
Range: 16870.58
Interquartile range (IQR): 84.5175

Descriptive statistics

Standard deviation: 542.80206
Coefficient of variation (CV): 4.4997366
Kurtosis: 468.37182
Mean: 120.62974
Median Absolute Deviation (MAD): 28.13
Skewness: 19.357192
Sum: 399766.96
Variance: 294634.07
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
3.95 | 7 | 0.2%
4.2 | 6 | 0.2%
4.02 | 5 | 0.2%
3.63 | 5 | 0.2%
4.76 | 5 | 0.2%
6.18 | 5 | 0.2%
5.07 | 5 | 0.2%
4.15 | 5 | 0.2%
4.8 | 5 | 0.2%
9.29 | 5 | 0.2%
Other values (2772) | 3261 | 98.4%

Minimum 10 values
Value | Count | Frequency (%)
0.4 | 1 | < 0.1%
0.63 | 1 | < 0.1%
0.68 | 1 | < 0.1%
0.75 | 1 | < 0.1%
0.79 | 1 | < 0.1%
0.81 | 1 | < 0.1%
0.82 | 1 | < 0.1%
0.86 | 1 | < 0.1%
0.87 | 1 | < 0.1%
0.89 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
16870.98 | 1 | < 0.1%
12307.16 | 1 | < 0.1%
11140.56 | 1 | < 0.1%
10734.57 | 1 | < 0.1%
8145.52 | 1 | < 0.1%
6330.91 | 1 | < 0.1%
4906.33 | 1 | < 0.1%
3995.45 | 1 | < 0.1%
3768.66 | 1 | < 0.1%
3278.11 | 1 | < 0.1%

SiteEnergyUse_kBtu_
Real number (ℝ)

HIGH CORRELATION  SKEWED  UNIQUE 

Distinct: 3314
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5450556.3
Minimum: 57133.199
Maximum: 8.7392371 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 57133.199
5-th percentile: 520937.92
Q1: 943547.03
median: 1821625
Q3: 4232943.2
95-th percentile: 18251998
Maximum: 8.7392371 × 10^8
Range: 8.7386658 × 10^8
Interquartile range (IQR): 3289396.2

Descriptive statistics

Standard deviation: 21773595
Coefficient of variation (CV): 3.9947473
Kurtosis: 847.3068
Mean: 5450556.3
Median Absolute Deviation (MAD): 1073179.6
Skewness: 24.698505
Sum: 1.8063144 × 10^10
Variance: 4.7408943 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
7226362.5 | 1 | < 0.1%
6714540 | 1 | < 0.1%
1148202.25 | 1 | < 0.1%
876569.6875 | 1 | < 0.1%
488991.5 | 1 | < 0.1%
1206165.75 | 1 | < 0.1%
1302192.875 | 1 | < 0.1%
150167.7969 | 1 | < 0.1%
1386445.375 | 1 | < 0.1%
1331469.75 | 1 | < 0.1%
Other values (3304) | 3304 | 99.7%

Minimum 10 values
Value | Count | Frequency (%)
57133.19922 | 1 | < 0.1%
79711.79688 | 1 | < 0.1%
90558.70313 | 1 | < 0.1%
97690.39844 | 1 | < 0.1%
106918 | 1 | < 0.1%
111969.7031 | 1 | < 0.1%
113130 | 1 | < 0.1%
116486.6016 | 1 | < 0.1%
117438.3984 | 1 | < 0.1%
123767.2031 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
873923712 | 1 | < 0.1%
448385312 | 1 | < 0.1%
293090784 | 1 | < 0.1%
291614432 | 1 | < 0.1%
274682208 | 1 | < 0.1%
253832464 | 1 | < 0.1%
163945984 | 1 | < 0.1%
143423024 | 1 | < 0.1%
131373880 | 1 | < 0.1%
114648520 | 1 | < 0.1%

LargestPropertyUseTypeGFA_log
Real number (ℝ)

Distinct: 3085
Distinct (%): 93.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6776795
Minimum: 3.7525094
Maximum: 6.9694232
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 3.7525094
5-th percentile: 4.2435787
Q1: 4.4000585
median: 4.6009076
Q3: 4.8853552
95-th percentile: 5.3886068
Maximum: 6.9694232
Range: 3.2169138
Interquartile range (IQR): 0.48529669

Descriptive statistics

Standard deviation: 0.37089589
Coefficient of variation (CV): 0.079290573
Kurtosis: 1.3489038
Mean: 4.6776795
Median Absolute Deviation (MAD): 0.22067407
Skewness: 1.0256002
Sum: 15501.83
Variance: 0.13756376
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
4.380211242 | 9 | 0.3%
4.342422681 | 8 | 0.2%
4.477121255 | 8 | 0.2%
4.334453751 | 7 | 0.2%
4.301029996 | 7 | 0.2%
4.459392488 | 6 | 0.2%
4.176091259 | 5 | 0.2%
4.385391754 | 5 | 0.2%
4.556302501 | 5 | 0.2%
4.653212514 | 5 | 0.2%
Other values (3075) | 3249 | 98.0%

Minimum 10 values
Value | Count | Frequency (%)
3.752509401 | 1 | < 0.1%
3.809896247 | 1 | < 0.1%
3.819609733 | 1 | < 0.1%
3.838849091 | 1 | < 0.1%
3.86003839 | 1 | < 0.1%
3.868468099 | 1 | < 0.1%
3.875119165 | 1 | < 0.1%
3.879841056 | 1 | < 0.1%
3.889749775 | 1 | < 0.1%
3.906388921 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
6.969423182 | 1 | < 0.1%
6.235438296 | 1 | < 0.1%
6.225551437 | 1 | < 0.1%
6.214667446 | 1 | < 0.1%
6.20029223 | 1 | < 0.1%
6.130392314 | 1 | < 0.1%
6.118752331 | 1 | < 0.1%
6.075953694 | 1 | < 0.1%
6.06897467 | 1 | < 0.1%
6.004809144 | 1 | < 0.1%
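
The `_log` columns appear to be base-10 logarithms of the corresponding raw columns. The report never states the transformation, so this is an inference; the sketch below checks it against minimum and maximum values reported in this document.

```python
import math

# (raw value, reported log value) pairs copied from the report's
# minimum and maximum statistics.
pairs = [
    (5656, 3.752509401),      # LargestPropertyUseTypeGFA minimum
    (9320156, 6.969423182),   # LargestPropertyUseTypeGFA maximum
    (0.4, -0.3979400087),     # TotalGHGEmissions minimum
    (16870.98, 4.227140311),  # TotalGHGEmissions maximum
]

for raw, reported in pairs:
    print(raw, round(math.log10(raw), 9), reported)

# Largest absolute difference between log10(raw) and the reported value.
max_abs_error = max(abs(math.log10(raw) - reported) for raw, reported in pairs)
```

Because the logarithm is monotonic, each `_log` column has the same number of distinct values as its raw counterpart, while its skewness is far smaller (for example 1.03 versus 30.22 for LargestPropertyUseTypeGFA).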

TotalGHGEmissions_log
Real number (ℝ)

Distinct: 2782
Distinct (%): 83.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5286686
Minimum: -0.39794001
Maximum: 4.2271403
Zeros: 0
Zeros (%): 0.0%
Negative: 10
Negative (%): 0.3%
Memory size: 26.0 KiB

Quantile statistics

Minimum: -0.39794001
5-th percentile: 0.59510994
Q1: 0.98721923
median: 1.5350407
Q3: 1.9741777
95-th percentile: 2.5945202
Maximum: 4.2271403
Range: 4.6250803
Interquartile range (IQR): 0.98695844

Descriptive statistics

Standard deviation: 0.64537739
Coefficient of variation (CV): 0.42218267
Kurtosis: -0.17943767
Mean: 1.5286686
Median Absolute Deviation (MAD): 0.4830969
Skewness: 0.27622109
Sum: 5066.0078
Variance: 0.41651198
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0.5965970956 | 7 | 0.2%
0.6232492904 | 6 | 0.2%
0.6042260531 | 5 | 0.2%
0.559906625 | 5 | 0.2%
0.6776069527 | 5 | 0.2%
0.7909884751 | 5 | 0.2%
0.7050079593 | 5 | 0.2%
0.6180480967 | 5 | 0.2%
0.6812412374 | 5 | 0.2%
0.968015714 | 5 | 0.2%
Other values (2772) | 3261 | 98.4%

Minimum 10 values
Value | Count | Frequency (%)
-0.3979400087 | 1 | < 0.1%
-0.2006594505 | 1 | < 0.1%
-0.1674910873 | 1 | < 0.1%
-0.1249387366 | 1 | < 0.1%
-0.1023729087 | 1 | < 0.1%
-0.09151498112 | 1 | < 0.1%
-0.08618614762 | 1 | < 0.1%
-0.06550154876 | 1 | < 0.1%
-0.06048074738 | 1 | < 0.1%
-0.05060999336 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4.227140311 | 1 | < 0.1%
4.090157847 | 1 | < 0.1%
4.046907022 | 1 | < 0.1%
4.030784652 | 1 | < 0.1%
3.910918814 | 1 | < 0.1%
3.80146614 | 1 | < 0.1%
3.690756756 | 1 | < 0.1%
3.6015657 | 1 | < 0.1%
3.576186958 | 1 | < 0.1%
3.515623523 | 1 | < 0.1%

SiteEnergyUse_kBtu_log
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 3314
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3401237
Minimum: 4.7568885
Maximum: 8.9414735
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 4.7568885
5-th percentile: 5.716786
Q1: 5.9747635
median: 6.2604589
Q3: 6.6266424
95-th percentile: 7.2613093
Maximum: 8.9414735
Range: 4.184585
Interquartile range (IQR): 0.65187897

Descriptive statistics

Standard deviation: 0.49374055
Coefficient of variation (CV): 0.07787554
Kurtosis: 0.95099165
Mean: 6.3401237
Median Absolute Deviation (MAD): 0.31456573
Skewness: 0.79560926
Sum: 21011.17
Variance: 0.24377973
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
6.858919744 | 1 | < 0.1%
6.827016265 | 1 | < 0.1%
6.060018394 | 1 | < 0.1%
5.942786448 | 1 | < 0.1%
5.68930131 | 1 | < 0.1%
6.081406992 | 1 | < 0.1%
6.114675315 | 1 | < 0.1%
5.176576809 | 1 | < 0.1%
6.141902763 | 1 | < 0.1%
6.124331304 | 1 | < 0.1%
Other values (3304) | 3304 | 99.7%

Minimum 10 values
Value | Count | Frequency (%)
4.756888543 | 1 | < 0.1%
4.901522599 | 1 | < 0.1%
4.956930194 | 1 | < 0.1%
4.989851881 | 1 | < 0.1%
5.029050826 | 1 | < 0.1%
5.049100527 | 1 | < 0.1%
5.053577787 | 1 | < 0.1%
5.066275975 | 1 | < 0.1%
5.06981012 | 1 | < 0.1%
5.092605577 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
8.941473523 | 1 | < 0.1%
8.651651378 | 1 | < 0.1%
8.467002163 | 1 | < 0.1%
8.464809013 | 1 | < 0.1%
8.43883053 | 1 | < 0.1%
8.404547166 | 1 | < 0.1%
8.214700783 | 1 | < 0.1%
8.156618875 | 1 | < 0.1%
8.118509027 | 1 | < 0.1%
8.059368453 | 1 | < 0.1%

Interactions

[Figure: 64 pairwise interaction plots, one per ordered pair of the 8 numeric variables]

Correlations

Variable | YearBuilt | NumberofBuildings | LargestPropertyUseTypeGFA | TotalGHGEmissions | SiteEnergyUse_kBtu_ | LargestPropertyUseTypeGFA_log | TotalGHGEmissions_log | SiteEnergyUse_kBtu_log | BuildingType | Neighborhood | Have_Stream_Energy | Have_Electricity_Energy | Have_NaturalGas_Energy | PrimaryPropertyType
YearBuilt | 1.000 | 0.038 | 0.291 | 0.027 | 0.161 | 0.291 | 0.027 | 0.161 | 0.158 | 0.176 | 0.157 | 0.026 | 0.338 | 0.186
NumberofBuildings | 0.038 | 1.000 | 0.083 | 0.049 | 0.055 | 0.083 | 0.049 | 0.055 | 0.240 | 0.000 | 0.081 | 0.000 | 0.000 | 0.136
LargestPropertyUseTypeGFA | 0.291 | 0.083 | 1.000 | 0.571 | 0.736 | 1.000 | 0.571 | 0.736 | 0.148 | 0.015 | 0.147 | 0.000 | 0.024 | 0.230
TotalGHGEmissions | 0.027 | 0.049 | 0.571 | 1.000 | 0.878 | 0.571 | 1.000 | 0.878 | 0.126 | 0.000 | 0.198 | 0.000 | 0.034 | 0.259
SiteEnergyUse_kBtu_ | 0.161 | 0.055 | 0.736 | 0.878 | 1.000 | 0.736 | 0.878 | 1.000 | 0.155 | 0.000 | 0.127 | 0.000 | 0.023 | 0.276
LargestPropertyUseTypeGFA_log | 0.291 | 0.083 | 1.000 | 0.571 | 0.736 | 1.000 | 0.571 | 0.736 | 0.196 | 0.086 | 0.231 | 0.000 | 0.167 | 0.285
TotalGHGEmissions_log | 0.027 | 0.049 | 0.571 | 1.000 | 0.878 | 0.571 | 1.000 | 0.878 | 0.216 | 0.095 | 0.353 | 0.031 | 0.691 | 0.315
SiteEnergyUse_kBtu_log | 0.161 | 0.055 | 0.736 | 0.878 | 1.000 | 0.736 | 0.878 | 1.000 | 0.239 | 0.115 | 0.288 | 0.000 | 0.352 | 0.353
BuildingType | 0.158 | 0.240 | 0.148 | 0.126 | 0.155 | 0.196 | 0.216 | 0.239 | 1.000 | 0.203 | 0.195 | 0.000 | 0.289 | 0.729
Neighborhood | 0.176 | 0.000 | 0.015 | 0.000 | 0.000 | 0.086 | 0.095 | 0.115 | 0.203 | 1.000 | 0.285 | 0.000 | 0.154 | 0.200
Have_Stream_Energy | 0.157 | 0.081 | 0.147 | 0.198 | 0.127 | 0.231 | 0.353 | 0.288 | 0.195 | 0.285 | 1.000 | 0.000 | 0.014 | 0.295
Have_Electricity_Energy | 0.026 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.031 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.276
Have_NaturalGas_Energy | 0.338 | 0.000 | 0.024 | 0.034 | 0.023 | 0.167 | 0.691 | 0.352 | 0.289 | 0.154 | 0.014 | 0.000 | 1.000 | 0.355
PrimaryPropertyType | 0.186 | 0.136 | 0.230 | 0.259 | 0.276 | 0.285 | 0.315 | 0.353 | 0.729 | 0.200 | 0.295 | 0.276 | 0.355 | 1.000
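
The "High correlation" alerts in the Overview are consistent with flagging every pair whose coefficient in this matrix is at least 0.5. The cut-off is an assumption (the report does not state it); the sketch below applies it to the matrix values and lists, for each variable, the fields it would be reported as correlated with. The names `THRESHOLD` and `correlated_fields` are hypothetical.

```python
names = [
    "YearBuilt", "NumberofBuildings", "LargestPropertyUseTypeGFA",
    "TotalGHGEmissions", "SiteEnergyUse_kBtu_", "LargestPropertyUseTypeGFA_log",
    "TotalGHGEmissions_log", "SiteEnergyUse_kBtu_log", "BuildingType",
    "Neighborhood", "Have_Stream_Energy", "Have_Electricity_Energy",
    "Have_NaturalGas_Energy", "PrimaryPropertyType",
]

# Rows copied from the correlation matrix, in the same order as `names`.
matrix = [
    [1.000, 0.038, 0.291, 0.027, 0.161, 0.291, 0.027, 0.161, 0.158, 0.176, 0.157, 0.026, 0.338, 0.186],
    [0.038, 1.000, 0.083, 0.049, 0.055, 0.083, 0.049, 0.055, 0.240, 0.000, 0.081, 0.000, 0.000, 0.136],
    [0.291, 0.083, 1.000, 0.571, 0.736, 1.000, 0.571, 0.736, 0.148, 0.015, 0.147, 0.000, 0.024, 0.230],
    [0.027, 0.049, 0.571, 1.000, 0.878, 0.571, 1.000, 0.878, 0.126, 0.000, 0.198, 0.000, 0.034, 0.259],
    [0.161, 0.055, 0.736, 0.878, 1.000, 0.736, 0.878, 1.000, 0.155, 0.000, 0.127, 0.000, 0.023, 0.276],
    [0.291, 0.083, 1.000, 0.571, 0.736, 1.000, 0.571, 0.736, 0.196, 0.086, 0.231, 0.000, 0.167, 0.285],
    [0.027, 0.049, 0.571, 1.000, 0.878, 0.571, 1.000, 0.878, 0.216, 0.095, 0.353, 0.031, 0.691, 0.315],
    [0.161, 0.055, 0.736, 0.878, 1.000, 0.736, 0.878, 1.000, 0.239, 0.115, 0.288, 0.000, 0.352, 0.353],
    [0.158, 0.240, 0.148, 0.126, 0.155, 0.196, 0.216, 0.239, 1.000, 0.203, 0.195, 0.000, 0.289, 0.729],
    [0.176, 0.000, 0.015, 0.000, 0.000, 0.086, 0.095, 0.115, 0.203, 1.000, 0.285, 0.000, 0.154, 0.200],
    [0.157, 0.081, 0.147, 0.198, 0.127, 0.231, 0.353, 0.288, 0.195, 0.285, 1.000, 0.000, 0.014, 0.295],
    [0.026, 0.000, 0.000, 0.000, 0.000, 0.000, 0.031, 0.000, 0.000, 0.000, 0.000, 1.000, 0.000, 0.276],
    [0.338, 0.000, 0.024, 0.034, 0.023, 0.167, 0.691, 0.352, 0.289, 0.154, 0.014, 0.000, 1.000, 0.355],
    [0.186, 0.136, 0.230, 0.259, 0.276, 0.285, 0.315, 0.353, 0.729, 0.200, 0.295, 0.276, 0.355, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated in the report


def correlated_fields(name):
    """Return the other variables whose coefficient with `name` meets the threshold."""
    i = names.index(name)
    return [other for j, other in enumerate(names)
            if j != i and matrix[i][j] >= THRESHOLD]


flagged = {name: correlated_fields(name) for name in names}
flagged = {name: fields for name, fields in flagged.items() if fields}

for name, fields in flagged.items():
    print(f"{name}: {len(fields)} -> {', '.join(fields)}")
```

With this cut-off the sketch flags the same nine variables as the Overview alerts, with TotalGHGEmissions_log the only one linked to six other fields.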

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
Index | YearBuilt | BuildingType | Neighborhood | Have_Stream_Energy | Have_Electricity_Energy | Have_NaturalGas_Energy | PrimaryPropertyType | NumberofBuildings | LargestPropertyUseTypeGFA | TotalGHGEmissions | SiteEnergyUse_kBtu_ | LargestPropertyUseTypeGFA_log | TotalGHGEmissions_log | SiteEnergyUse_kBtu_log
0 | 1927 | NonResidential | DOWNTOWN | True | True | True | Hotel | 1 | 88434.0 | 249.98 | 7226362.5 | 4.946619 | 2.397905 | 6.858920
1 | 1996 | NonResidential | DOWNTOWN | False | True | True | Hotel | 1 | 83880.0 | 295.86 | 8387933.0 | 4.923658 | 2.471086 | 6.923655
2 | 1969 | NonResidential | DOWNTOWN | True | True | True | Hotel | 1 | 756493.0 | 2089.28 | 72587024.0 | 5.878805 | 3.319997 | 7.860859
3 | 1926 | NonResidential | DOWNTOWN | True | True | True | Hotel | 1 | 61320.0 | 286.43 | 6794584.0 | 4.787602 | 2.457019 | 6.832163
4 | 1980 | NonResidential | DOWNTOWN | False | True | True | Hotel | 1 | 123445.0 | 505.01 | 14172606.0 | 5.091474 | 2.703300 | 7.151450
5 | 1999 | Nonresidential COS | DOWNTOWN | False | True | True | Other | 1 | 88830.0 | 301.81 | 12086616.0 | 4.948560 | 2.479734 | 7.082305
6 | 1926 | NonResidential | DOWNTOWN | False | True | True | Hotel | 1 | 81352.0 | 176.14 | 5758795.0 | 4.910368 | 2.245858 | 6.760332
7 | 1926 | NonResidential | DOWNTOWN | True | True | True | Other | 1 | 102761.0 | 221.51 | 6298131.5 | 5.011828 | 2.345393 | 6.799212
8 | 1904 | NonResidential | DOWNTOWN | False | True | True | Hotel | 1 | 163984.0 | 392.16 | 13723820.0 | 5.214801 | 2.593463 | 7.137475
9 | 1910 | Multifamily MR (5-9) | DOWNTOWN | True | True | True | Mid-Rise Multifamily | 1 | 56132.0 | 151.12 | 4573777.0 | 4.749211 | 2.179322 | 6.660275

Last rows
Index | YearBuilt | BuildingType | Neighborhood | Have_Stream_Energy | Have_Electricity_Energy | Have_NaturalGas_Energy | PrimaryPropertyType | NumberofBuildings | LargestPropertyUseTypeGFA | TotalGHGEmissions | SiteEnergyUse_kBtu_ | LargestPropertyUseTypeGFA_log | TotalGHGEmissions_log | SiteEnergyUse_kBtu_log
3304 | 1952 | Nonresidential COS | MAGNOLIA / QUEEN ANNE | False | True | False | Office | 1 | 13661.0 | 3.50 | 5.026677e+05 | 4.135482 | 0.544068 | 5.701281
3305 | 1912 | Nonresidential COS | EAST | False | True | True | Other | 1 | 23445.0 | 259.22 | 5.976246e+06 | 4.370050 | 2.413669 | 6.776428
3306 | 1994 | Nonresidential COS | CENTRAL | False | True | True | Mixed Use Property | 1 | 8108.0 | 60.81 | 1.813404e+06 | 3.908914 | 1.783975 | 6.258495
3307 | 1960 | Nonresidential COS | SOUTHEAST | False | True | True | Office | 1 | 15398.0 | 7.79 | 3.878100e+05 | 4.187464 | 0.891537 | 5.588619
3308 | 1982 | Nonresidential COS | DELRIDGE NEIGHBORHOODS | False | True | True | Other | 1 | 18261.0 | 20.33 | 9.320821e+05 | 4.261525 | 1.308137 | 5.969454
3309 | 1990 | Nonresidential COS | GREATER DUWAMISH | False | True | True | Office | 1 | 12294.0 | 20.94 | 8.497457e+05 | 4.089693 | 1.320977 | 5.929289
3310 | 2004 | Nonresidential COS | DOWNTOWN | False | True | True | Other | 1 | 16000.0 | 32.17 | 9.502762e+05 | 4.204120 | 1.507451 | 5.977850
3311 | 1974 | Nonresidential COS | MAGNOLIA / QUEEN ANNE | False | True | True | Other | 1 | 7583.0 | 223.54 | 5.765898e+06 | 3.879841 | 2.349355 | 6.760867
3312 | 1989 | Nonresidential COS | GREATER DUWAMISH | False | True | True | Mixed Use Property | 1 | 6601.0 | 22.11 | 7.194712e+05 | 3.819610 | 1.344589 | 5.857013
3313 | 1938 | Nonresidential COS | GREATER DUWAMISH | False | True | True | Mixed Use Property | 1 | 8271.0 | 41.27 | 1.152896e+06 | 3.917558 | 1.615634 | 6.061790